Abstract

For probability measures on a complete separable metric space, we present sufficient
conditions for the existence of a solution to the Kantorovich transportation problem. We
also obtain sufficient conditions (which in some cases are also necessary) for the convergence
of probability measures in transportation distance when the cost function is continuous,
non-decreasing and depends on the distance. As an application, the CLT in the transportation
distance is proved for independent and some dependent stationary sequences.

Keywords: Kantorovich transportation problem, convergence in transportation distance, 
1 Introduction

Let $(M, d)$ be a metric space and let $c : M \times M \to \mathbb{R}_+$ be a non-negative Borel function. The
transportation $c$-distance $\mathcal{T}_c(\mu,\nu)$ between two probability measures $\mu$ and $\nu$ defined on the
Borel $\sigma$-field $\mathcal{B}(M)$ is given by

$$\mathcal{T}_c(\mu,\nu) = \inf \mathbb{E}\, c(X,Y).$$

Above, the infimum is taken over all $M$-valued random elements $X$ and $Y$ defined on the
probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and having, respectively, $\mu$ and $\nu$ as probability distributions. In
other words,

$$\mathcal{T}_c(\mu,\nu) = \inf_{\pi \in \Pi} \int c(x,y)\, d\pi(x,y), \qquad (1)$$

where the infimum is taken over the set $\Pi$ of all probability measures on $\mathcal{B}(M \times M)$ with
marginals $\mu$ and $\nu$. The transportation distance is related to the celebrated Kantorovich
transportation problem: if $\mu$ and $\nu$ are two distributions of mass and if $c(x, y)$ represents the
cost of transporting a unit of mass from the location $x$ to the location $y$, what is the minimal
total cost of transporting $\mu$ to $\nu$? The minimal total transportation cost is exactly
the transportation distance corresponding to the cost function $c$.
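For finitely supported measures the infimum in (1) is a finite linear program. As a hedged illustration (a sketch, not part of the original argument), the Python snippet below assumes that $\mu$ and $\nu$ are uniform on $n$ atoms each. In that special case an optimal coupling may be taken to be a permutation (Birkhoff-von Neumann theorem), so brute-force enumeration returns $\mathcal{T}_c$ exactly for small $n$. The function name `transport_cost_uniform` is hypothetical.

```python
from itertools import permutations
from typing import Callable, Sequence


def transport_cost_uniform(
    xs: Sequence[float],
    ys: Sequence[float],
    cost: Callable[[float, float], float],
) -> float:
    """Exact T_c between the uniform measures on the atoms xs and ys.

    Assumes len(xs) == len(ys) == n and weight 1/n on every atom. The extreme
    points of the set of couplings are then (1/n) * permutation matrices, so
    minimising over the n! permutations is exact (only feasible for small n).
    """
    n = len(xs)
    if n == 0 or len(ys) != n:
        raise ValueError("need two non-empty lists of the same length")
    best = min(
        sum(cost(xs[i], ys[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )
    return best / n


if __name__ == "__main__":
    # Three unit-mass atoms moved with cost |x - y|; prints 1.3333333333333333.
    print(transport_cost_uniform([0, 1, 3], [1, 2, 5], lambda x, y: abs(x - y)))
```

Brute force is chosen only for transparency; for larger $n$ one would use an assignment or linear-programming solver instead.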
The $c$-transportation distance with $c(x,y) = d^p(x,y)$, $p \ge 1$, is associated with the Wasserstein
or Mallows $p$-distance $W_p$, $W_p(\mu,\nu) = (\mathcal{T}_{d^p}(\mu,\nu))^{1/p}$. If $M$ is the real line $\mathbb{R}$ with the
Euclidean distance, the Wasserstein-Mallows $p$-distance between two distribution functions
$F$ and $G$ has the following useful representation:

$$W_p^p(F,G) = \int_0^1 |F^{-1}(t) - G^{-1}(t)|^p\, dt, \qquad (2)$$

where the inverse of $F$ is defined as

$$F^{-1}(t) = \sup\{x \in \mathbb{R} : F(x) < t\}.$$

The representation (2) was obtained for $p = 1$ by Salvemini [20] (for discrete distributions)
and by Dall'Aglio [7] (in the general case), while for $p = 2$ it is due to Mallows. It
implies that the random variables $X = F^{-1}(U)$ and $Y = G^{-1}(U)$, where $U$ is a uniform
random variable on $(0, 1)$, minimize the total transportation cost in the transportation
problem. Major [14] generalized (2) to a convex cost function $c(x,y) = c(x - y)$:

$$\mathcal{T}_c(F,G) = \int_0^1 c\big(F^{-1}(t) - G^{-1}(t)\big)\, dt.$$
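For two uniform empirical measures on $\mathbb{R}$ with $n$ atoms each, $F^{-1}$ and $G^{-1}$ are step functions that are constant on the intervals $((i-1)/n, i/n]$, so (2) reduces to the average of $|x_{(i)} - y_{(i)}|^p$ over the order statistics, i.e. to the "sorted" coupling $X = F^{-1}(U)$, $Y = G^{-1}(U)$. The sketch below (hypothetical helper names, standard library only) computes this value and cross-checks it against brute force over all permutation couplings, which is exact for uniform weights.

```python
import random
from itertools import permutations


def wasserstein_pp_sorted(xs, ys, p):
    """W_p^p between two uniform empirical measures via representation (2).

    Pairing the i-th smallest x with the i-th smallest y is exactly the
    coupling (F^{-1}(U), G^{-1}(U)) with U uniform on (0, 1).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) ** p for x, y in pairs) / len(xs)


def wasserstein_pp_bruteforce(xs, ys, p):
    """Same quantity, minimising over all n! permutation couplings."""
    n = len(xs)
    return min(
        sum(abs(xs[i] - ys[s[i]]) ** p for i in range(n))
        for s in permutations(range(n))
    ) / n


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0, 1) for _ in range(6)]
    ys = [rng.uniform(-2, 2) for _ in range(6)]
    for p in (1, 2, 3):
        # The two columns agree up to floating-point rounding.
        print(p, wasserstein_pp_sorted(xs, ys, p), wasserstein_pp_bruteforce(xs, ys, p))
```

The agreement for every $p \ge 1$ reflects the convexity of $c(x - y) = |x - y|^p$, in line with Major's generalization of (2).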

The representation (2) is an important tool in proving the following convergence result.

Let $p = 1, 2$ and let $F_n$, $F$ be distribution functions on $\mathbb{R}$ such that $\int |x|^p\, dF_n < +\infty$ for any $n$
and $\int |x|^p\, dF < +\infty$. Then

$$W_p(F_n, F) \to 0 \iff F_n \Rightarrow F \ \text{and} \ \int |x|^p\, dF_n \to \int |x|^p\, dF. \qquad (3)$$

For $p = 1$ the equivalence (3) was proved by Dobrushin, while for $p = 2$ it is due to
Mallows.

Bickel and Freedman [2] extended statement (3) to probability measures $\mu_n$ and $\mu$
defined on a separable Banach space $(B, \|\cdot\|)$ and to all $p \in [1, +\infty)$ as follows:

Let $1 \le p < \infty$, and let $\int \|x\|^p \mu_n(dx) < \infty$, $\int \|x\|^p \mu(dx) < \infty$. Then $W_p(\mu_n,\mu) \to 0$ as
$n \to \infty$ is equivalent to each of the following.

(a) $\mu_n \Rightarrow \mu$ and $\int \|x\|^p \mu_n(dx) \to \int \|x\|^p \mu(dx)$.

(b) $\mu_n \Rightarrow \mu$ and $\|\cdot\|^p$ is uniformly $\mu_n$-integrable.

(c) $\int \phi(x)\, \mu_n(dx) \to \int \phi(x)\, \mu(dx)$ for every continuous $\phi$ such that $\phi(x) = O(\|x\|^p)$ at
infinity.

Since, in general, no analog of the representation (2) exists for probability measures
on a Banach space, Bickel and Freedman proved, in their setting, the existence of a solution
to the transportation problem for $c(x,y) = \|x - y\|^p$.

Recently, Ambrosio, Gigli, and Savaré proved ([1], Proposition 7.1.5) an analog of part
(b) of the above result for probability measures on a Radon space $X$ (see also Lemma 5.1.7
and Remark 7.1.11 there). These authors also established the existence of a solution to the
Kantorovich transportation problem in $X$ for a wide class of cost functions. We use this
existence result to prove criteria for convergence in $\mathcal{T}_c$ with $c(x, y) = C(d(x, y))$, where $C$
is a non-decreasing continuous function satisfying the doubling condition (6), which controls
the rate of growth of $C$ (Theorem 2 and Corollary 1). Since the class of such cost functions
includes all the $d^p$, $p \ge 1$, the convergence results of Bickel and Freedman as well as those of
Ambrosio, Gigli, and Savaré follow from Corollary 1. Note that instead of the Radon space
$X$ (a separable metric space in which, by definition, every probability measure is tight), we
consider an object more familiar in probability theory, a complete separable metric space $(M, d)$,
where completeness and separability together ensure the tightness of every probability measure;
all our arguments remain valid for a Radon space (see also Remark 1).

In Theorem 2 we also obtain sufficient conditions for the convergence of probability measures
in the transportation distance without assuming the doubling condition on $C$. For
instance, any convex $C : \mathbb{R}_+ \to \mathbb{R}_+$ with $C(0) = 0$ satisfies the hypotheses of Theorem 2. We then provide an
example of a function $C$ growing exponentially fast for which the converse implication does not hold.

2 Convergence in Transportation Distance 

The following result of Ambrosio, Gigli, and Savaré [1] asserts the existence in $\Pi$ of a probability
measure which minimizes the total transportation cost under rather weak assumptions on
the cost function. For the sake of completeness, we include a self-contained proof in Section
4.

Theorem 1. Let $(M,d)$ be a complete separable metric space, and let $\mathcal{T}_c(\mu,\nu)$ be defined by
(1) with $c : M \times M \to [0, +\infty)$ lower semicontinuous.

Then there exists $\pi^* \in \Pi$ such that $\int c(x,y)\, d\pi^*(x,y) = \mathcal{T}_c(\mu,\nu)$. Equivalently, there
exists a pair of random elements $X$ and $Y$ with respective distributions $\mu$ and $\nu$ such that

$$\mathbb{E}\, c(X,Y) = \mathcal{T}_c(\mu,\nu).$$

Remark 1. In the corresponding statement of [1], the space $(M,d)$ need not be a complete
separable metric space but merely a Radon space. In fact, our proof also shows that completeness
is unnecessary and that the tightness of $\mu$ and $\nu$ suffices. On the other hand, the hypothesis
of separability of $(M, d)$ can be weakened to topological separability if both $\mu$ and $\nu$ have
separable supports (see Billingsley [3], Appendix III).

The Kantorovich problem is closely related to the Monge transportation problem, which
is the problem of finding a map $s^*$ pushing $\mu$ forward to $\nu$ (i.e. such that $\nu(B) = \mu(s^{*-1}(B))$
for any Borel set $B$) and minimizing the total transportation cost: $\inf_s \int c(x, s(x))\, d\mu =
\int c(x,s^*(x))\, d\mu$, where the infimum is taken over all Borel maps $s$ pushing $\mu$ forward to
$\nu$. A solution $s^*$ to the Monge transportation problem uniquely determines a probability
measure $\pi^*$ on $M \times M$ such that the random elements $X$ and $Y = s^*(X)$, with respective
distributions $\mu$ and $\nu$, have joint law $\pi^*$. This measure $\pi^*$ minimizes the Monge transportation
cost:

$$\widetilde{\mathcal{T}}_c(\mu,\nu) = \inf_{\pi \in \Pi^*} \int c(x,y)\, d\pi(x,y), \qquad (4)$$

where the infimum is taken over the set $\Pi^*$ of joint distributions of $M$-valued random elements
$X$ and $Y$ with respective distributions $\mu$ and $\nu$ and such that $Y$ is measurable with respect to
the $\sigma$-field $\sigma(X)$. Comparing (4) and (1) yields the inclusion $\Pi^* \subset \Pi$, which immediately
leads to the following conclusions: (i) the (Kantorovich) transportation distance $\mathcal{T}_c(\mu,\nu)$
never exceeds the Monge transportation distance $\widetilde{\mathcal{T}}_c(\mu,\nu)$,

$$\widetilde{\mathcal{T}}_c(\mu,\nu) = \inf_{\pi \in \Pi^*} \int c(x,y)\, d\pi(x,y) = \inf_s \int c(x,s(x))\, d\mu(x);$$

(ii) a probability measure $\pi^*$ corresponding to the solution $s^*$ of the Monge transportation
problem (MTP) is not necessarily a solution to the Kantorovich transportation problem
(KTP); conversely, a solution $\pi'$ of the KTP, where $\pi'$ is the joint distribution of $X$ and $Y$,
is a solution to the MTP if and only if there exists a Borel map $s'$ such that $Y = s'(X)$.
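A standard example (added here for illustration, not taken from the source) shows that the inequality in (i) can be strict. Write $\widetilde{\mathcal{T}}_c$ for the Monge transportation distance, with the convention $\inf\emptyset = +\infty$, and take $M = \mathbb{R}$, $c(x,y) = |x-y|$, $\mu = \delta_0$ and $\nu = \frac12(\delta_{-1} + \delta_1)$. The only coupling of $\mu$ and $\nu$ is the product measure, while every map pushes a Dirac mass to a Dirac mass:

```latex
\begin{aligned}
&\Pi = \{\delta_0 \otimes \nu\}, \qquad
 \mathcal{T}_c(\mu,\nu) = \int |y|\,\nu(dy) = \tfrac12 + \tfrac12 = 1, \\
&s_{\#}\delta_0 = \delta_{s(0)} \neq \nu \ \text{for every Borel map } s
 \ \Longrightarrow\ \Pi^* = \emptyset, \qquad
 \widetilde{\mathcal{T}}_c(\mu,\nu) = \inf\emptyset = +\infty .
\end{aligned}
```

Here the Kantorovich infimum is attained, as Theorem 1 guarantees, by a coupling that splits the unit mass located at $0$, which no map can do.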

For random elements $X$ and $Y$ in a Hilbert space, Cuesta and Matrán [5] provided
conditions for the existence of an increasing map $s$, $s(X) = Y$, such that $W_2^2(\mu, \nu) = \mathbb{E}\|X -
s(X)\|^2$, i.e. $X$ and $Y = s(X)$ give the solution to both the MTP and the KTP. They also
showed that these conditions are satisfied if $\mu$ is either absolutely continuous with respect to
the Lebesgue measure on $\mathbb{R}^k$ or a Gaussian measure on a Hilbert space. For
compactly supported absolutely continuous distributions on $\mathbb{R}^k$ and a convex cost function
$c(x - y)$, Caffarelli determined the form of the optimal map (the solution to the MTP)
in terms of the gradient of $c$; the uniqueness of the solution is also obtained there. Simultaneously,
Gangbo and McCann proved the same results for probability measures that are not necessarily
boundedly supported. They also showed that the solution to the MTP is the KTP solution as
well, and that a similar result holds true for $c(x,y) = l(\|x - y\|)$, where $l$ is strictly concave.
Note that in all the existence statements mentioned above, the conditions of Theorem 1 are
satisfied. A comprehensive review of the results on the solutions to the KTP and the MTP
can be found in the books of Rachev and Rüschendorf [19].

We now state the main result of this paper.

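Theorem 2. Let $\mu_n$ and $\mu$ be probability measures on a complete separable metric space
$(M, d)$ and let $c : M \times M \to \mathbb{R}$ be such that $c(x, y) = C(d(x, y))$, where $C : [0, +\infty) \to [0, +\infty)$
is a non-decreasing continuous function with $C(0) = 0$. Let

$$\int C(2d(x, a))\, \mu_n(dx) < \infty, \qquad \int C(2d(x, a))\, \mu(dx) < \infty \qquad (5)$$

for some (and, therefore, for all) $a \in M$. Then

$$\left.\begin{aligned} &(a)\ \mu_n \Rightarrow \mu, \\ &(b)\ \int C(2d(x,a))\, \mu_n(dx) \to \int C(2d(x,a))\, \mu(dx) \end{aligned}\right\} \implies \mathcal{T}_c(\mu_n,\mu) \to 0.$$

Conversely, if $\mathcal{T}_c(\mu_n,\mu) \to 0$, then $\mu_n \Rightarrow \mu$. If, additionally, $C$ satisfies a doubling condition,
i.e. if there exists a positive constant $\lambda$ such that for all $y \ge 0$

$$C(2y) \le \lambda C(y), \qquad (6)$$

then

$$\mathcal{T}_c(\mu_n,\mu) \to 0 \iff \left\{\begin{aligned} &(a)\ \mu_n \Rightarrow \mu, \\ &(b)\ \int C(2d(x,a))\, \mu_n(dx) \to \int C(2d(x,a))\, \mu(dx). \end{aligned}\right.$$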
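Condition (6) asks that $\sup_{y>0} C(2y)/C(y) < \infty$. The hedged Python sketch below evaluates this ratio on a finite grid for two illustrative costs: $C(y) = y^p$, for which the ratio equals $2^p$ identically, and $C(y) = e^y - 1$, for which the ratio equals $e^y + 1$ and is unbounded. A finite grid can only suggest, never prove, boundedness; all helper names are hypothetical.

```python
import math


def doubling_ratios(C, ys):
    """Return the ratios C(2y) / C(y) over the grid points y with C(y) > 0."""
    return [C(2 * y) / C(y) for y in ys if C(y) > 0]


def power_cost(p):
    """The cost C(y) = y**p, p >= 1 (satisfies (6) with lambda = 2**p)."""
    return lambda y: y ** p


def exp_cost(y):
    """The cost C(y) = exp(y) - 1 (non-decreasing, continuous, C(0) = 0)."""
    return math.expm1(y)  # expm1 stays accurate for small y


if __name__ == "__main__":
    grid = [0.01 * k for k in range(1, 2001)]  # y in (0, 20]
    print(max(doubling_ratios(power_cost(3), grid)))  # about 8 = 2**3
    print(max(doubling_ratios(exp_cost, grid)))       # about exp(20) + 1
```

Both costs are covered by the first part of Theorem 2, but only the power cost satisfies (6) and hence the equivalence.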
Corollary 1. If, in the setting of Theorem 2, $C$ satisfies (6), then

$$(5) \iff \int C(d(x,a))\, \mu_n(dx) < \infty, \quad \int C(d(x,a))\, \mu(dx) < \infty,$$

and thus

$$\mathcal{T}_c(\mu_n,\mu) \to 0 \iff \left\{\begin{aligned} &(a)\ \mu_n \Rightarrow \mu, \\ &(b')\ \int C(d(x,a))\, \mu_n(dx) \to \int C(d(x,a))\, \mu(dx). \end{aligned}\right.$$



Corollary 1 is equivalent to a result of Rachev (his Theorem 1), proved by using the
relations between the Lévy-Prokhorov metric and the $\mathcal{T}_c$-distance. Since for any $p \ge 1$ the
function $c(x,y) = d^p(x,y)$ satisfies the conditions of Theorem 2 as well as (6) with $\lambda = 2^p$,
Corollary 1 recovers part (a) of the result of Bickel and Freedman mentioned above. Ambrosio,
Gigli, and Savaré proved an analog of Theorem 2 in a Hilbert space when the cost function is a
continuous, strictly increasing and surjective map.

Note that the class of functions $C$ covered by Theorem 2 includes functions with a faster
than polynomial rate of growth at infinity (e.g. $C(d(x,y)) = \exp(d(x,y)) - 1$). For functions
$C$ growing exponentially fast at infinity, and in contrast to $C(d(x,y)) = d^p(x,y)$, $\mathcal{T}_c(\mu_n,\mu) \to 0$
need not imply the convergence of $\int C(2d(x,a))\, \mu_n(dx)$ to $\int C(2d(x, a))\, \mu(dx)$ for some
$a \in M$. Indeed, one can take the probability measures $\mu_n$ and $\mu$ on $\mathbb{R}$ defined in Example 1
below, and $c(x, y) = C(|x - y|) = \exp(|x - y|) - 1$.

As a corollary to Theorem 2, we obtain the following result relating convergence in
total variation to convergence in transportation distance. It is well known that the
total variation distance itself is a particular case of a transportation distance (with $c(x, y) =
2\,\mathbf{1}_{\{x \ne y\}}$).
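With the normalisation used in Example 1 ($\|\mu - \nu\|_{TV} = \sum_i |p_i - q_i|$ for discrete measures), the claim that total variation is the transportation distance for $c(x,y) = 2\,\mathbf{1}_{\{x \ne y\}}$ can be checked on small examples. The hedged sketch below assumes finitely supported measures. The closed form $2(1 - \sum_i \min(p_i, q_i))$ comes from leaving the common mass $\min(p_i, q_i)$ in place; for uniform measures on the same number of atoms the code also compares it with brute force over permutation couplings. All helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def tv_norm(p, q):
    """||p - q||_TV = sum_i |p_i - q_i| for dicts mapping atoms to masses."""
    atoms = set(p) | set(q)
    return sum(abs(p.get(a, 0) - q.get(a, 0)) for a in atoms)


def tv_transport_closed_form(p, q):
    """Minimal cost for c(x, y) = 2 * 1{x != y}: move only the excess mass."""
    common = sum(min(p.get(a, 0), q.get(a, 0)) for a in set(p) | set(q))
    return 2 * (1 - common)


def empirical(xs):
    """Uniform measure on the list xs (repeated atoms add up), exact masses."""
    n = len(xs)
    return {a: Fraction(k, n) for a, k in Counter(xs).items()}


def tv_transport_bruteforce(xs, ys):
    """Minimal cost over permutation couplings of two uniform n-atom measures."""
    n = len(xs)
    best = min(sum(2 * (xs[i] != ys[s[i]]) for i in range(n))
               for s in permutations(range(n)))
    return Fraction(best, n)


if __name__ == "__main__":
    xs, ys = [0, 0, 1], [0, 1, 2]
    p, q = empirical(xs), empirical(ys)
    # Prints: 2/3 2/3 2/3
    print(tv_norm(p, q), tv_transport_closed_form(p, q), tv_transport_bruteforce(xs, ys))
```

Exact `Fraction` arithmetic is used so that the three quantities can be compared with `==` rather than with a tolerance.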

Corollary 2. Let $\mu$ and $\nu$ be boundedly supported probability measures on a complete separable
metric space $(M,d)$, and let $\phi$ be a continuous function on $(M,d)$. Then

$$\left| \int \phi(x)\, \mu(dx) - \int \phi(x)\, \nu(dx) \right| \le L_\phi \|\mu - \nu\|_{TV}$$

for some positive constant $L_\phi$.

Let $\mu_n$ be probability measures on $M$ with respective supports $K_n$, $n \ge 1$. Let $\bigcup_n K_n$ be
bounded. If $c(x,y) = C(d(x,y))$, where $C : [0, +\infty) \to [0, +\infty)$ is non-decreasing, continuous
and $C(0) = 0$, then

$$\|\mu_n - \mu\|_{TV} \to 0 \implies \mathcal{T}_c(\mu_n, \mu) \to 0.$$

Without the boundedness restriction on $\bigcup_n K_n$ the last implication fails, as the
following example shows.

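Example 1. Let $\mu$ be the uniform distribution on $(0, 1)$ and, for all $n \in \mathbb{N}$, let

$$\mu_n(dx) = \frac{n-1}{n} \mathbf{1}_{(0,1)}(x)\, dx + \frac{1}{n} \delta_{x_n}(dx), \qquad x_n \notin (0,1).$$

Then, writing $f_n(x) = \frac{n-1}{n}\mathbf{1}_{(0,1)}(x)$ for the density of the absolutely continuous part of $\mu_n$,

$$\|\mu_n - \mu\|_{TV} = \int_0^1 |f_n(x) - 1|\, dx + \mu_n(\{x_n\}) = \frac{2}{n} \to 0$$

as $n \to \infty$. Hence $\mu_n \to \mu$ in total variation for any choice of the sequence $(x_n)$. Let $c(x,y) = C(|x - y|)$,
with $C : [0, +\infty) \to [0, +\infty)$, $C(0) = 0$, convex, also satisfying (6). Then

$$\int C(|x|)\, \mu(dx) = \int_0^1 C(|x|)\, dx \le \max_{0 \le |x| \le 1} C(|x|) < +\infty,$$

$$\int C(|x|)\, \mu_n(dx) = \int_0^1 \frac{n-1}{n} C(|x|)\, dx + \mu_n(\{x_n\}) C(|x_n|) \le \max_{0 \le |x| \le 1} C(|x|) + \frac{C(|x_n|)}{n} < +\infty,$$

for any $n$. So all the conditions of Corollary 1 are satisfied. Since weak convergence is
implied by convergence in total variation, $\mathcal{T}_c(\mu_n, \mu) \to 0$ holds if and only if $\int C(|x|)\, \mu_n(dx) \to
\int C(|x|)\, \mu(dx)$. Take $x_n = 2^n$. Convexity and $C(0) = 0$ give $C(2y) \ge 2C(y)$, hence $C(|x_n|) = C(2^n) \ge 2^{n-1} C(2)$, and, for $C(2) > 0$, $C(|x_n|)/n \to +\infty$ as
$n \to \infty$. Therefore,

$$\int C(|x|)\, \mu_n(dx) \ge \frac{C(|x_n|)}{n} \to +\infty, \qquad \text{so} \qquad \int C(|x|)\, \mu_n(dx) \not\to \int C(|x|)\, \mu(dx).$$

By Corollary 1, $\mathcal{T}_c(\mu_n, \mu)$ does not converge to 0.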
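To see the mechanism of Example 1 in numbers, take the hypothetical choice $C(y) = y$ (convex, $C(0) = 0$, and (6) holds with $\lambda = 2$) and $x_n = 2^n$. For this cost on $\mathbb{R}$ the transportation distance is $W_1$, which also equals $\int |F_n(x) - F(x)|\, dx$ (the $p = 1$ case of (2), rewritten in terms of distribution functions). Since $F_n(x) = (n-1)x/n$ on $[0,1]$ and $F_n = (n-1)/n$ on $[1, 2^n)$, one gets $W_1(\mu_n, \mu) = 1/(2n) + (2^n - 1)/n$, while $\|\mu_n - \mu\|_{TV} = 2/n$. The exact-arithmetic sketch below tabulates these closed forms; the function names are illustrative only.

```python
from fractions import Fraction


def tv_example1(n):
    """||mu_n - mu||_TV = 1/n (absolutely continuous part) + 1/n (atom)."""
    return Fraction(1, n) + Fraction(1, n)


def w1_example1(n):
    """W_1(mu_n, mu) = int |F_n - F| dx with the atom placed at x_n = 2**n."""
    on_unit_interval = Fraction(1, 2 * n)  # int_0^1 x/n dx
    beyond = Fraction(2 ** n - 1, n)       # |F_n - F| = 1/n on [1, 2**n)
    return on_unit_interval + beyond


def moment_example1(n):
    """int C(|x|) mu_n(dx) for C(y) = y: ((n - 1)/n) * (1/2) + 2**n / n."""
    return Fraction(n - 1, 2 * n) + Fraction(2 ** n, n)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        # Total variation shrinks like 2/n while W_1 and the first moment explode.
        print(n, float(tv_example1(n)), float(w1_example1(n)), float(moment_example1(n)))
```

For $n = 1$ the formula gives $W_1(\delta_2, \mu) = \mathbb{E}|2 - U| = 3/2$, a quick consistency check of the closed form.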



3 Applications to the Central Limit Theorem 

Next, we apply Theorem 2 to obtain the CLT in the transportation distance. We provide 
sufficient conditions for the convergence of the laws of the normalized sums to the standard 
Gaussian measure on R for stationary sequences which are either independent, strongly 
mixing or associated. 



3.1 Independent sequences 

Let $(X_n)$ be a sequence of independent identically distributed random variables with $\mathbb{E}X_1 = 0$,
$\mathbb{E}X_1^2 = \sigma^2$, $0 < \sigma < +\infty$. Let $S_n = \sum_{k=1}^n X_k$. Then, by the classical central limit theorem,

$$\frac{S_n}{\sigma\sqrt{n}} \Rightarrow N(0,1).$$

Let $\mu_n$ denote the probability law of the normalized sum $S_n/(\sigma\sqrt{n})$, and let $\gamma$ be the
standard Gaussian measure on $\mathbb{R}$. We find additional conditions on the sequence $(X_n)$ and
on the cost function that yield the convergence of $\mu_n$ to $\gamma$ in the $\mathcal{T}_c$-distance.

Theorem 3. Let $c(x,y) = C(|x - y|)$, where $C : [0, +\infty) \to [0, +\infty)$ is a non-decreasing
continuous function with $C(0) = 0$, and let $Z \sim N(0,1)$.

(i) If there exists $p > 2$ such that $C(x) = O(x^p)$ at infinity and $\mathbb{E}|X_1|^p < +\infty$, then
$\mathcal{T}_c(\mu_n, \gamma) \to 0$.

(ii) Otherwise, let $\mathbb{E}C(4\sqrt{2}|Z|) < +\infty$ and let $\sum_{k=1}^{\infty} k^{k} \cdots \mathbb{E}X_1^{2k} < +\infty$; then $\mathcal{T}_c(\mu_n, \gamma) \to 0$.
The CLT in the $W_2$-distance was proved by Tanaka for distributions on $\mathbb{R}$ and
by Cuesta and Matrán for distributions on a Hilbert space; both results require the
finiteness of the fourth moment. Very recently, Johnson and Samworth [12, 13] proved that
$W_p(\mu_n, \gamma) \to 0$, $p > 2$, under the condition $\mathbb{E}|X_1|^p < \infty$. This statement is a particular case
of part (i) of Theorem 3. Note, however, that these authors also proved convergence to an $\alpha$-stable
law in the Wasserstein-Mallows $\alpha$-distance, $\alpha \in (0,2)$.
We prove Theorem 3 by applying Theorem 2. The CLT yields weak convergence;
therefore, to prove convergence in the $\mathcal{T}_c$-distance, we need to verify the convergence of $\int C(2|x|)\, d\mu_n$
to $\int C(2|x|)\, d\gamma$. To do so, we use Rosenthal's inequality, which asserts that for a stationary
independent sequence $(X_n)$ of centered random variables

$$\mathbb{E}|S_n|^p \le K(p) \left( n\mathbb{E}|X_1|^p + n^{p/2} (\mathbb{E}X_1^2)^{p/2} \right) \qquad (7)$$

for $p > 1$ and a positive constant $K(p)$ depending only on $p$ (Petrov [17]).
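As a small sanity check of (7) (an illustration added here, not from the source), take $p = 4$ and hypothetical Rademacher summands, $X_k = \pm 1$ with probability $1/2$ each, so that $\mathbb{E}|X_1|^4 = \mathbb{E}X_1^2 = 1$. Exact enumeration over the $2^n$ sign patterns gives $\mathbb{E}S_n^4 = 3n^2 - 2n$, which for this particular example is bounded by $K(n + n^2)$ with $K = 3$; the $n^{p/2}$ term dominates, as in the CLT scaling.

```python
from fractions import Fraction
from itertools import product


def fourth_moment_rademacher(n):
    """Exact E S_n^4 for S_n = X_1 + ... + X_n with i.i.d. signs X_k = +/-1."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n)


def rosenthal_rhs(n, K=3):
    """Right-hand side of (7) for p = 4: K * (n E|X_1|^4 + n^2 (E X_1^2)^2)."""
    return K * (n + n ** 2)


if __name__ == "__main__":
    for n in range(1, 11):
        m4 = fourth_moment_rademacher(n)
        # The last column E S_n^4 / n^2 stays below 3.
        print(n, m4, rosenthal_rhs(n), float(m4 / n ** 2))
```

The constant $K = 3$ is specific to this example; (7) only asserts the existence of some $K(p)$ depending on $p$ alone.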

3.2 Strong mixing sequences 

The strong mixing coefficients $\alpha_n$ of a random sequence $(X_n)$ are defined as

$$\alpha_n = \sup\{ |\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)| : A \in \mathcal{F}_1^k,\ B \in \mathcal{F}_{k+n}^{\infty},\ k \ge 1 \},$$

where $\mathcal{F}_k^{k+m}$ is the $\sigma$-field generated by the random variables $X_k, X_{k+1}, \ldots, X_{k+m}$. A sequence
is said to satisfy a strong mixing condition if $\alpha_n \to 0$ as $n \to +\infty$.

A CLT for a stationary strong mixing sequence $(X_n)$ was proved by Denker [8] in the
following form. Let $\mathbb{E}X_1 = 0$, $\mathbb{E}X_1^2 = \sigma^2$, $0 < \sigma < +\infty$, and $\sigma_n^2 = \mathbb{E}S_n^2 = n h(n)$, where
$h(n)$ is a slowly varying function. Let $S_n = \sum_{k=1}^n X_k$, and let $\mu_n$ be the law of $S_n/\sigma_n$. Then
$\mu_n \Rightarrow \gamma$, where $\gamma$ is the standard Gaussian measure on $\mathbb{R}$.

To obtain the convergence of $\mu_n$ to $\gamma$ in $\mathcal{T}_c$, we need additional conditions on the rate of
decay of the coefficients $\alpha_n$ that provide a moment inequality for sums. Such a result is
due to Yokoyama [22]: if $(X_n)$ is a stationary strong mixing sequence such
that $\mathbb{E}X_1 = 0$, $\mathbb{E}|X_1|^{p+\delta} < +\infty$, $p > 2$, $\delta > 0$, and

$$\sum_{n=1}^{\infty} (n+1)^{p/2 - 1} \alpha_n^{\delta/(p+\delta)} < +\infty, \qquad (8)$$

then

$$\mathbb{E}|S_n|^p \le K(p)\, n^{p/2}, \qquad (9)$$

where the positive constant $K(p)$ depends only on $p$.
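To get a feel for condition (8), suppose (purely as an illustrative assumption) a polynomial mixing rate $\alpha_n = n^{-a}$. The general term of (8) then behaves like $n^{p/2 - 1 - a\delta/(p+\delta)}$, so the series converges if and only if $a > p(p+\delta)/(2\delta)$. The sketch below computes this threshold and a partial sum of (8); the helper names are hypothetical.

```python
def yokoyama_threshold(p, delta):
    """Rate a* such that alpha_n = n**(-a) satisfies (8) iff a > a*.

    The general term is about n**(p/2 - 1 - a*delta/(p + delta)); the series
    converges iff this exponent is < -1, i.e. iff a > p*(p + delta)/(2*delta).
    """
    if p <= 2 or delta <= 0:
        raise ValueError("Yokoyama's inequality is stated for p > 2, delta > 0")
    return p * (p + delta) / (2 * delta)


def partial_sum_8(p, delta, a, N):
    """Partial sum of (8) up to N for the polynomial rate alpha_n = n**(-a)."""
    return sum((n + 1) ** (p / 2 - 1) * (n ** (-a)) ** (delta / (p + delta))
               for n in range(1, N + 1))


if __name__ == "__main__":
    print(yokoyama_threshold(4, 2))  # 6.0
    # Above the threshold the partial sums stabilise; below it they grow.
    print(partial_sum_8(4, 2, 12, 2000), partial_sum_8(4, 2, 12, 20000))
    print(partial_sum_8(4, 2, 3, 2000), partial_sum_8(4, 2, 3, 20000))
```

Larger $\delta$ (more moments) lowers the required mixing rate, which is the usual trade-off between moment and mixing assumptions.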

Theorem 4. Let $c(x,y) = C(|x - y|)$, where $C : [0, +\infty) \to [0, +\infty)$ is a non-decreasing
continuous function with $C(0) = 0$.

(i) If there exist $p > 2$ and $\delta > 0$ such that condition (8) is satisfied and $C(x) = O(x^p)$
at infinity, then $\mathcal{T}_c(\mu_n, \gamma) \to 0$.

(ii) Otherwise, let $\mathbb{E}C(4\sqrt{2}|Z|) < +\infty$, where $Z \sim N(0,1)$, let $\sum_{k=1}^{\infty} k^{k} \cdots \mathbb{E}X_1^{2k} < +\infty$, and let (8) be satisfied
for all $p > 2$; then $\mathcal{T}_c(\mu_n, \gamma) \to 0$.

To prove Theorem 4, we once again apply Theorem 2. Since the corresponding CLT
implies weak convergence to the Gaussian measure, it is sufficient to show the convergence
of $\mathbb{E}C(2|Y_n|)$ to $\mathbb{E}C(2|Z|)$, where $Y_n = S_n/\sigma_n$ and $Z \sim N(0,1)$.
3.3 Associated sequences

Recall that a set of random variables $\xi = (\xi_1, \ldots, \xi_m)$ is called associated if for any two
coordinatewise non-decreasing functions $f, g : \mathbb{R}^m \to \mathbb{R}$ such that $\mathbb{E}f(\xi)g(\xi)$, $\mathbb{E}f(\xi)$ and $\mathbb{E}g(\xi)$
exist,

$$\mathrm{Cov}(f(\xi), g(\xi)) \ge 0.$$

An infinite set of random variables is associated if all of its finite subsets are associated.

Newman proved the CLT for associated sequences under the following conditions. Let
$(X_n)$ be a stationary associated sequence, $\mathbb{E}X_1 = 0$, $\mathbb{E}X_1^2 = \sigma^2$, $0 < \sigma < +\infty$, $\sigma_n^2 = \mathbb{E}S_n^2 =
n h(n)$, where $h(n)$ is a slowly varying function. Let $\mu_n$ be the law of $S_n/\sigma_n$ and let $\gamma$ be the
standard Gaussian measure on $\mathbb{R}$. Then $\mu_n \Rightarrow \gamma$.

Asymptotic independence for an associated sequence $(X_n)$ is usually stated in terms of the
Cox-Grimmett coefficient $u(n)$ defined by

$$u(n) = \sup_{k \ge 1} \sum_{j : |j - k| \ge n} \mathrm{Cov}(X_j, X_k).$$

For a stationary sequence the Cox-Grimmett coefficient is just the tail of the series of covariances:

$$u(n) = 2 \sum_{k=n+1}^{\infty} \mathrm{Cov}(X_1, X_k).$$
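For a stationary sequence the coefficient $u(n)$ can be computed directly from the covariance function. The hedged sketch below assumes, purely for illustration, covariances $\mathrm{Cov}(X_1, X_k) = \rho^{k-1}$ with $0 \le \rho < 1$, as for a unit-variance Gaussian AR(1) sequence with coefficient $\rho \ge 0$ (such a Gaussian sequence is associated because all its covariances are nonnegative, by Pitt's theorem). Then $u(n) = 2\rho^n/(1-\rho)$, which decays faster than any power of $n$. The code compares a truncated tail sum with this closed form; the helper names are hypothetical.

```python
def cox_grimmett_stationary(cov, n, terms=10_000):
    """u(n) = 2 * sum_{k > n} Cov(X_1, X_k), truncated after `terms` terms.

    `cov(k)` must return Cov(X_1, X_k) for k >= 1.
    """
    return 2 * sum(cov(k) for k in range(n + 1, n + 1 + terms))


def ar1_cov(rho):
    """Cov(X_1, X_k) = rho**(k - 1) for a unit-variance Gaussian AR(1)."""
    return lambda k: rho ** (k - 1)


if __name__ == "__main__":
    rho = 0.5
    for n in (1, 5, 10):
        approx = cox_grimmett_stationary(ar1_cov(rho), n)
        exact = 2 * rho ** n / (1 - rho)
        print(n, approx, exact)
```

Truncation is harmless here because the neglected tail is of order $\rho^{n + \text{terms}}$.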

To prove the convergence of $\mu_n$ to $\gamma$ in the transportation distance, we use a condition
on the rate of decay of the Cox-Grimmett coefficient. This condition implies the following
moment inequality for sums (Birkel). If $(X_n)$ is a stationary associated sequence, $\mathbb{E}X_1 = 0$,
$\mathbb{E}|X_1|^{p+\delta} < +\infty$, $p > 2$, $\delta > 0$, and

$$u(n) \le B\, n^{-(p-2)(p+\delta)/(2\delta)} \qquad (10)$$

for some positive constant $B$, then

$$\mathbb{E}|S_n|^p \le K(p)\, n^{p/2}, \qquad (11)$$

where the positive constant $K(p)$ depends only on $p$.

Theorem 5. Let $c(x,y) = C(|x - y|)$, where $C : [0, +\infty) \to [0, +\infty)$ is a non-decreasing
continuous function with $C(0) = 0$.

(i) If there exist $p > 2$ and $\delta > 0$ such that condition (10) is satisfied and $C(x) = O(x^p)$
at infinity, then $\mathcal{T}_c(\mu_n, \gamma) \to 0$.

(ii) Otherwise, let $\mathbb{E}C(4\sqrt{2}|Z|) < +\infty$, where $Z \sim N(0,1)$, let $\sum_{k=1}^{\infty} k^{k} \cdots \mathbb{E}X_1^{2k} < +\infty$, and let (10) be satisfied
for all $p > 2$; then $\mathcal{T}_c(\mu_n, \gamma) \to 0$.



4 Proofs 

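Proof of Theorem 1. We first show that $\Pi$ is a tight set. Indeed, for any positive $\varepsilon$ there
exist compact sets $K_1, K_2 \in \mathcal{B}(M)$ such that $\mu(K_1) > 1 - \varepsilon/2$ and $\nu(K_2) > 1 - \varepsilon/2$. Let $\pi \in \Pi$
and let $(X, Y)$ be a random vector with law $\pi$. Then

$$\begin{aligned} \pi(K_1 \times K_2) &= \mathbb{P}(X \in K_1, Y \in K_2) = \mathbb{P}(X \in K_1) + \mathbb{P}(Y \in K_2) - \mathbb{P}(\{X \in K_1\} \cup \{Y \in K_2\}) \\ &\ge \mu(K_1) + \nu(K_2) - 1 > (1 - \varepsilon/2) + (1 - \varepsilon/2) - 1 = 1 - \varepsilon. \end{aligned} \qquad (12)$$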
Since (12) holds for all $\pi \in \Pi$ with the same compact set $K_1 \times K_2$, the set $\Pi$ is
tight. Therefore, by Prokhorov's theorem (Billingsley [3], Section 5), $\Pi$ is relatively
compact.

If $\mathcal{T}_c(\mu, \nu) = +\infty$, then $\int c(x, y)\, d\pi(x, y) = +\infty$ for all $\pi \in \Pi$, and $\pi^*$ can be chosen to be
any probability measure from $\Pi$.

If $\mathcal{T}_c(\mu, \nu) < +\infty$, then there exists a sequence $(\pi_n)$ in $\Pi$ such that

$$\int c(x,y)\, d\pi_n(x,y) \to \mathcal{T}_c(\mu,\nu). \qquad (13)$$

On the other hand, the relative compactness of $\Pi$ implies the existence of a subsequence $(\pi_{n_k})$
which converges weakly to some probability measure $\pi$ on $\mathcal{B}(M \times M)$. Let us verify that $\pi$ is
the measure $\pi^*$ we are looking for. First, we prove that $\pi \in \Pi$, i.e. that the marginal
distributions of $\pi$ are $\mu$ and $\nu$, respectively.

Let $\mu_1$ and $\nu_1$ be the marginals of $\pi$. We check that $\mu_1(B) = \mu(B)$ for any $B \in \mathcal{B}(M)$
such that $\mu_1(\partial B) = 0$. Indeed, since $\partial(B \times M) \subset (\partial B \times M) \cup (B \times \partial M) = \partial B \times M$ (Billingsley
[3], (2.8)), we have

$$\pi(\partial(B \times M)) \le \pi(\partial B \times M) = \mu_1(\partial B) = 0.$$

Therefore, the weak convergence $\pi_{n_k} \Rightarrow \pi$ implies that $\pi_{n_k}(B \times M) \to \pi(B \times M)$, and we
obtain

$$\mu(B) = \pi_{n_k}(B \times M) \to \pi(B \times M) = \mu_1(B).$$

Similarly, $\nu_1(B) = \nu(B)$ for any $B \in \mathcal{B}(M)$ such that $\nu_1(\partial B) = 0$. Finally,
it remains to check that two probability measures $\mu_1$ and $\mu$ (respectively $\nu_1$ and $\nu$) are
the same if they coincide on the Borel sets whose boundary has $\mu_1$-measure (respectively
$\nu_1$-measure) zero.

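Let $D \in \mathcal{B}(M)$ be a closed set. For $\varepsilon > 0$, let $D^\varepsilon = \{x \in M : d(x, D) < \varepsilon\}$ and let
$\mathcal{D} = \{D^\varepsilon, 0 < \varepsilon < 1\}$. Since the boundaries $\partial D^\varepsilon \subset \{x : d(x,D) = \varepsilon\}$ are pairwise disjoint,
there exist at most countably many $\varepsilon_k$, $0 < \varepsilon_k < 1$, such that the sets $D^{\varepsilon_k}$ have a boundary of
positive $\mu_1$-measure. We remove the sets $D^{\varepsilon_k}$ from $\mathcal{D}$ and obtain

$$\mathcal{D}^\circ = \{D^\varepsilon,\ 0 < \varepsilon < 1,\ \mu_1(\partial D^\varepsilon) = 0\}.$$

We can then choose a decreasing sequence $\varepsilon_n \to 0$, $0 < \varepsilon_n < 1$, with $D_n = D^{\varepsilon_n} \in \mathcal{D}^\circ$. The sets
$D_n$ are such that: (a) $D_{n+1} \subset D_n$ for all $n$; (b) $\bigcap_n D_n = \overline{D} = D$; (c) $\mu_1(D_n) = \mu(D_n)$.
The properties (a)-(c) yield

$$\mu_1(D) = \mu_1\Big(\bigcap_n D_n\Big) = \lim_{n \to \infty} \mu_1(D_n) = \lim_{n \to \infty} \mu(D_n) = \mu(D).$$

Therefore, the measures $\mu_1$ and $\mu$ coincide on all the closed subsets of $M$. Since the closed sets are
stable under finite intersections and generate $\mathcal{B}(M)$, we conclude that $\mu_1 = \mu$. Similar arguments lead to
$\nu_1 = \nu$. We have proved that the probability measure $\pi$ has marginals $\mu$ and $\nu$, i.e. $\pi \in \Pi$.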

Next, we check that $\int c(x,y)\, d\pi(x,y) = \mathcal{T}_c(\mu, \nu)$. Since $c$ is lower semicontinuous, for
any real $b$ the set $\{(x, y) : c(x, y) > b\}$ is open (Billingsley [3], Appendix I). Let $A = \{(x, y) : c(x, y) > 0\}$.
By the portmanteau theorem, $\pi(\{c > b\}) \le \liminf_k \pi_{n_k}(\{c > b\})$ for every $b \ge 0$; integrating over
$b \in [0, \infty)$ and applying Fatou's lemma, we see that the weak convergence $\pi_{n_k} \Rightarrow \pi$ and (13) imply

$$\begin{aligned} \int c(x,y)\, d\pi(x,y) &= \int_A c(x,y)\, d\pi(x,y) \\ &\le \liminf_{n_k} \int_A c(x,y)\, d\pi_{n_k}(x,y) \\ &= \liminf_{n_k} \int c(x, y)\, d\pi_{n_k}(x, y) \\ &= \mathcal{T}_c(\mu,\nu). \end{aligned}$$

Since $\pi \in \Pi$, the reverse inequality $\int c(x,y)\, d\pi(x,y) \ge \mathcal{T}_c(\mu,\nu)$ holds true. We thus conclude
that $\int c(x,y)\, d\pi(x,y) = \mathcal{T}_c(\mu,\nu)$. In other words, the transportation distance equals the
total transportation cost associated with the measure $\pi$. Finally, we set $\pi^* = \pi$, and the proof
is complete. □



Proof of Theorem 2 and Corollary 1. Assume that both (a) and (b) are satisfied. Let
$X$, $X_n$ be random elements with respective distributions $\mu$ and $\mu_n$ such that $X$ and
$X_n$ are independent for any $n$. By (b) and (5), the sequence $(\mathbb{E}C(2d(X_n, a)))$ converges and is therefore bounded, that is, $I_1 =
\sup_n \mathbb{E}C(2d(X_n,a)) < \infty$. Set $I_2 = \mathbb{E}C(2d(X,a)) < \infty$.

Fix $\varepsilon > 0$ and choose a compact set $K_1$ in $\mathcal{B}(M)$ such that $\mu(\partial K_1) = 0$ and

$$\int_{K_1^c} C(2d(x,a))\, d\mu(x) < \varepsilon.$$

The weak convergence $\mu_n \Rightarrow \mu$ implies the tightness of the family $(\mu_n, \mu)_{n \ge 1}$; thus there
exists a compact set $K_2 \in \mathcal{B}(M)$ such that $\mu_n(K_2^c) < \varepsilon$, $\mu(K_2^c) < \varepsilon$ and $\mu(\partial K_2) = 0$. Let
$K = K_1 \cup K_2$. Then $K$ is compact, and

$$\int_{K^c} C(2d(x,a))\, d\mu(x) < \varepsilon, \qquad (14)$$

$$\mu_n(K^c) < \varepsilon, \qquad \mu(K^c) < \varepsilon, \qquad (15)$$

with also $\mu(\partial K) = 0$, since $\mu(\partial K) \le \mu(\partial K_1) + \mu(\partial K_2)$. Since (b) holds, we can choose a
positive integer $N_1$ such that for any $n > N_1$,

$$\left| \int C(2d(x,a))\, d\mu_n(x) - \int C(2d(x,a))\, d\mu(x) \right| < \varepsilon. \qquad (16)$$



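As $X_n \Rightarrow X$, for the chosen compact set $K$ (recall that $\mu(\partial K) = 0$) and the continuous function
$C(2d(\cdot, a))$ we have

$$\mathbb{E}C(2d(X_n,a))\mathbf{1}_{\{X_n \in K\}} \to \mathbb{E}C(2d(X,a))\mathbf{1}_{\{X \in K\}}.$$

Hence, we can choose a positive integer $N_2$ such that, for any $n > N_2$,

$$\left| \int_K C(2d(x,a))\, d\mu_n(x) - \int_K C(2d(x,a))\, d\mu(x) \right| < \varepsilon. \qquad (17)$$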
Then for $n > \max\{N_1, N_2\}$, the estimates (14), (16) and (17) yield

$$\begin{aligned} \int_{K^c} C(2d(x,a))\, d\mu_n(x) &\le \left| \int C(2d(x,a))\, d\mu_n(x) - \int C(2d(x,a))\, d\mu(x) \right| \\ &\quad + \left| \int_K C(2d(x,a))\, d\mu_n(x) - \int_K C(2d(x,a))\, d\mu(x) \right| + \int_{K^c} C(2d(x,a))\, d\mu(x) \\ &< 3\varepsilon. \end{aligned} \qquad (18)$$



The weak convergence $X_n \Rightarrow X$ implies that $C(d(X_n, X))\mathbf{1}_{\{X_n \in K,\, X \in K\}} \to 0$. The
continuous function $C(d(x,y))$ is bounded on the compact set $K \times K$; therefore

$$\mathbb{E}C(d(X_n,X))\mathbf{1}_{\{X_n \in K,\, X \in K\}} \to 0.$$

This means that there exists a positive integer $N_3$ such that, for any $n > N_3$,

$$\int_K \int_K C(d(x,y))\, d\pi_n(x,y) < \varepsilon, \qquad (19)$$

where $\pi_n$ is the joint distribution of $X_n$ and $X$.
Since $C$ is non-negative and non-decreasing,

$$\begin{aligned} C(d(x,y)) &\le C(d(x,a) + d(y,a)) \le C(2\max\{d(x,a), d(y,a)\}) \\ &\le C(2d(x,a)) + C(2d(y,a)), \end{aligned} \qquad (20)$$

for all $x, y \in M$.

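Using the inequalities (14), (15), (18), (20), and the independence of $X_n$ and $X$, we have

$$\begin{aligned} \int_{K^c}\int_{K^c} C(d(x,y))\, d\pi_n(x,y) &= \mathbb{E}C(d(X_n,X))\mathbf{1}_{\{X_n \in K^c,\, X \in K^c\}} \\ &\le \mathbb{E}C(2d(X_n, a))\mathbf{1}_{\{X_n \in K^c\}}\mathbf{1}_{\{X \in K^c\}} + \mathbb{E}C(2d(X,a))\mathbf{1}_{\{X \in K^c\}}\mathbf{1}_{\{X_n \in K^c\}} \\ &\le \Big(\int_{K^c} C(2d(x,a))\, d\mu_n(x)\Big) \mu(K^c) + \Big(\int_{K^c} C(2d(x,a))\, d\mu(x)\Big) \mu_n(K^c) \\ &< 3\varepsilon^2 + \varepsilon^2, \end{aligned} \qquad (21)$$

for all $n > \max\{N_1, N_2\}$. Similarly,

$$\begin{aligned} \int_K\int_{K^c} C(d(x,y))\, d\pi_n(x,y) &= \mathbb{E}C(d(X_n,X))\mathbf{1}_{\{X_n \in K,\, X \in K^c\}} \\ &\le \mathbb{E}C(2d(X_n,a))\mathbf{1}_{\{X_n \in K\}}\mathbf{1}_{\{X \in K^c\}} + \mathbb{E}C(2d(X, a))\mathbf{1}_{\{X \in K^c\}}\mathbf{1}_{\{X_n \in K\}} \\ &\le I_1 \mu(K^c) + \varepsilon \mu_n(K) \\ &< I_1 \varepsilon + \varepsilon, \end{aligned} \qquad (22)$$

and

$$\begin{aligned} \int_{K^c}\int_K C(d(x,y))\, d\pi_n(x,y) &= \mathbb{E}C(d(X_n, X))\mathbf{1}_{\{X_n \in K^c,\, X \in K\}} \\ &\le \mathbb{E}C(2d(X_n,a))\mathbf{1}_{\{X_n \in K^c\}}\mathbf{1}_{\{X \in K\}} + \mathbb{E}C(2d(X,a))\mathbf{1}_{\{X \in K\}}\mathbf{1}_{\{X_n \in K^c\}} \\ &\le 3\varepsilon \mu(K) + I_2 \mu_n(K^c) < 3\varepsilon + I_2 \varepsilon, \end{aligned} \qquad (23)$$

for $n > \max\{N_1, N_2\}$.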

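Thus, for $n > \max\{N_1, N_2, N_3\}$, the inequalities (19) and (21)-(23) yield

$$\begin{aligned} \mathcal{T}_c(\mu_n,\mu) \le \mathbb{E}C(d(X_n,X)) &= \int\int C(d(x,y))\, d\pi_n(x,y) \\ &= \int_K\int_K C(d(x,y))\, d\pi_n(x,y) + \int_{K^c}\int_{K^c} C(d(x,y))\, d\pi_n(x,y) \\ &\quad + \int_K\int_{K^c} C(d(x,y))\, d\pi_n(x,y) + \int_{K^c}\int_K C(d(x,y))\, d\pi_n(x,y) \\ &< \varepsilon(6 + 4\varepsilon + I_1 + I_2). \end{aligned}$$

We conclude that (a) and (b) imply $\mathcal{T}_c(\mu_n, \mu) \to 0$.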

Next, we assume that $\mathcal{T}_c(\mu_n,\mu) \to 0$ and verify that (a) $\mu_n \Rightarrow \mu$ holds. According
to Theorem 1, for any $n$ there exists a pair of random elements $X_n$ and $X$ with distributions
$\mu_n$ and $\mu$, respectively, which minimize the total transportation cost: $\mathcal{T}_c(\mu_n,\mu) =
\mathbb{E}C(d(X_n, X))$. Note that $X$ may depend on $n$, so each time it appears in this proof
we assume that $X = X^{(n)}$. (Of course, all the $X^{(n)}$ have the same law $\mu$.)

Since $C$ is a non-negative function, $\mathbb{E}C(d(X_n,X)) \to 0$ means that $C(d(X_n,X)) \to 0$ in $L^1$. This
implies that

$$C(d(X_n,X)) \xrightarrow{\mathbb{P}} 0. \qquad (24)$$

Fix $\varepsilon > 0$. As $C$ is non-decreasing, we have

$$\{d(X_n,X) > \varepsilon\} \subset \{C(d(X_n,X)) \ge C(\varepsilon)\}.$$

The convergence (24) means that the probability of the last event tends to 0 as $n \to \infty$ whenever
$C(\varepsilon) > 0$. Hence, for any $\varepsilon > 0$, $\mathbb{P}(d(X_n,X) > \varepsilon) \to 0$ as $n \to \infty$.

The convergence in probability $d(X_n, X) = d(X_n, X^{(n)}) \to 0$ implies that $\mu_n \Rightarrow \mu$
(Billingsley [3], Theorem 4.1).

Now assume that the doubling condition (6) is satisfied, and let us verify that $\mathcal{T}_c(\mu_n, \mu) \to 0$
implies (b'). Since $\mu_n \Rightarrow \mu$ and $C(d(\cdot, a))$ is continuous on $M$, weak convergence
holds: $C(d(X_n, a)) \Rightarrow C(d(X, a))$. In order to verify (b'), it thus suffices to check that
the sequence $(C(d(X_n,a)))$ is uniformly integrable. Uniform integrability is equivalent
to the pair of conditions: (i) $(\mathbb{E}C(d(X_n, a)))$ is uniformly bounded and (ii) for $A \in \mathcal{F}$,
$(\mathbb{E}C(d(X_n, a))\mathbf{1}_A)$ is uniformly continuous (i.e. $\sup_n \mathbb{E}C(d(X_n, a))\mathbf{1}_A \to 0$ as $\mathbb{P}(A) \to 0$).
Together, (6) and (20) yield the inequalities

$$C(d(x, a)) \le \lambda C\Big(\frac{1}{2} d(x, a)\Big) \le \lambda C(d(x, y)) + \lambda C(d(y, a)), \qquad (25)$$

for all $x, y \in M$, with the positive constant $\lambda$ from (6). Then

$$\mathbb{E}C(d(X_n, X)) \ge \frac{1}{\lambda} \mathbb{E}C(d(X_n, a)) - \mathbb{E}C(d(X, a)). \qquad (26)$$

Suppose that $(\mathbb{E}C(d(X_n,a)))$ is not uniformly bounded. Then there exists a subsequence
$(\mathbb{E}C(d(X_{n'}, a)))$ such that $\mathbb{E}C(d(X_{n'},a)) \to +\infty$. Applying (26) to this subsequence, we
arrive at the following contradiction:

$$0 \leftarrow \mathbb{E}C(d(X_{n'}, X)) \ge \frac{1}{\lambda} \mathbb{E}C(d(X_{n'},a)) - \mathbb{E}C(d(X, a)) \to +\infty.$$
Thus, $(\mathbb{E}C(d(X_n, a)))$ is uniformly bounded.

Let $\varepsilon > 0$ be fixed, and let $A \in \mathcal{F}$. Since $\mathcal{T}_c(\mu_n,\mu) \to 0$, we can choose a positive integer $N$
such that $\mathbb{E}C(d(X_n,X))\mathbf{1}_A < \varepsilon$ for all $n > N$. Applying the inequality (25) once again,
we obtain

$$\sup_{n > N} \mathbb{E}C(d(X_n,a))\mathbf{1}_A \le \lambda \sup_{n > N} \mathbb{E}C(d(X_n, X))\mathbf{1}_A + \lambda \mathbb{E}C(d(X,a))\mathbf{1}_A.$$

Let $\mathbb{P}(A) \to 0$. Since $(C(d(X_n,X)))$ is uniformly integrable (it converges to 0 in $L^1$) and since
$\mathbb{E}C(d(X, a)) \le \mathbb{E}C(2d(X,a)) < \infty$,

$$\sup_{n > N} \mathbb{E}C(d(X_n,a))\mathbf{1}_A \to 0,$$

i.e. $(\mathbb{E}C(d(X_n,a))\mathbf{1}_A)$ is uniformly continuous. Hence, the sequence $(C(d(X_n, a)))$ is uniformly
integrable and (b') $\int C(d(x,a))\, d\mu_n \to \int C(d(x,a))\, d\mu$ holds.

Note that, by (6) and since $C$ is non-decreasing, the following two inequalities hold true:

$$C(2d(x, a)) \le \lambda C(d(x, a)), \qquad C(d(x, a)) \le C(2d(x, a)),$$

for any $x \in M$. This implies that

$$\Big(\int C(d(x, a))\, \mu_n(dx) < \infty\Big) \iff \Big(\int C(2d(x, a))\, \mu_n(dx) < \infty\Big)$$

and that

$$\Big(\int C(d(x,a))\, \mu(dx) < \infty\Big) \iff \Big(\int C(2d(x,a))\, \mu(dx) < \infty\Big).$$

Therefore, in the setting of the theorem, the sequences $(C(d(X_n,a)))$ and $(C(2d(X_n,a)))$ are either both
uniformly integrable or both not, and (b) $\iff$ (b').

This observation completes the proof of Theorem 2 and of Corollary 1. □



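Proof of Corollary 2. Let $K_1$ and $K_2$ be the respective supports of $\mu$ and $\nu$. If $\mu$ and $\nu$
are absolutely continuous with respective densities $f_1$ and $f_2$, then

$$\begin{aligned} \left| \int \phi(x)\, d\mu - \int \phi(x)\, d\nu \right| &= \left| \int \phi(x) f_1(x)\, dx - \int \phi(x) f_2(x)\, dx \right| \\ &\le \int_{K_1 \cup K_2} |\phi(x)|\, |f_1(x) - f_2(x)|\, dx \le L_\phi \|\mu - \nu\|_{TV}, \end{aligned}$$

where $L_\phi = \sup\{|\phi(x)| : x \in \overline{K_1 \cup K_2}\}$ (here $\overline{A} = A \cup \partial A$).

To prove the result in the general case, define the partition $(A_m)_{m \in \mathbb{Z}}$ of $M$, $A_m \in \mathcal{B}(M)$,
as follows:

$$A_m = \{x \in M : m - 1 \le \phi(x) < m\}.$$

Thus,

$$\left| \int \phi(x)\, d\mu - \int \phi(x)\, d\nu \right| = \left| \sum_{m=-\infty}^{+\infty} \Big( \int_{A_m} \phi(x)\, d\mu - \int_{A_m} \phi(x)\, d\nu \Big) \right| \le \sum_{m=-\infty}^{+\infty} |m|\, |\mu(A_m) - \nu(A_m)| \le L_\phi \|\mu - \nu\|_{TV}, \qquad (27)$$

where $L_\phi$ is defined as above, and where we used the dual definition of the total variation
distance.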

Let $\mu_n$ and $\mu$ be probability measures on $M$ with bounded supports, denoted respectively by
$K_n$ and $K$. Let also $\bigcup_n K_n$ be bounded and $\|\mu_n - \mu\|_{TV} \to 0$. Convergence in total variation
implies weak convergence $\mu_n \Rightarrow \mu$. All the conditions of Theorem 2 are satisfied. Therefore,
to prove the convergence of $\mu_n$ to $\mu$ in $\mathcal{T}_c$, it suffices to check that $\int C(2d(x,a))\, d\mu_n \to
\int C(2d(x,a))\, d\mu$ for some $a \in M$. The inequality (27) yields, for any $n$,

$$\left| \int \phi(x)\, d\mu_n - \int \phi(x)\, d\mu \right| \le L_\phi \|\mu_n - \mu\|_{TV}, \qquad (28)$$

where $L_\phi = \sup\{|\phi(x)| : x \in \overline{\bigcup_i K_i}\} < \infty$ does not depend on $n$. By fixing $a \in M$ and
applying (28) to $\phi(x) = C(2d(x, a))$, we obtain that convergence in total variation implies
convergence of the integrals $\int \phi(x)\, d\mu_n \to \int \phi(x)\, d\mu$. This completes the proof. □



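Proof of Theorem 3. (i) First, we set $Y_n = S_n/(\sigma\sqrt{n})$ and check that $\mathbb{E}|Y_n|^p \to \mathbb{E}|Z|^p$.

The classical CLT gives $Y_n \Rightarrow Z$, while the uniform boundedness of $\mathbb{E}|Y_n|^p$ follows from
Rosenthal's inequality (7):

$$\mathbb{E}|Y_n|^p \le K(p) \left( \frac{\mathbb{E}|X_1|^p}{\sigma^p n^{p/2 - 1}} + 1 \right) \le K(p) \left( \frac{\mathbb{E}|X_1|^p}{\sigma^p} + 1 \right). \qquad (29)$$

Let $Z \sim N(0, 1)$ be independent of the sequence $(X_n)$. We fix $\varepsilon > 0$ and choose a compact set
$K \in \mathcal{B}(\mathbb{R})$ such that

$$\gamma(K^c) < \varepsilon, \qquad \mathbb{E}|Z|^p \mathbf{1}_{\{Z \in K^c\}} < \varepsilon, \qquad \gamma(\partial K) = 0. \qquad (30)$$

Hence,

$$\big| \mathbb{E}|Y_n|^p - \mathbb{E}|Z|^p \big| \le \big| \mathbb{E}|Y_n|^p \mathbf{1}_{\{Z \in K\}} - \mathbb{E}|Z|^p \mathbf{1}_{\{Z \in K\}} \big| + \mathbb{E}|Y_n|^p \mathbf{1}_{\{Z \in K^c\}} + \mathbb{E}|Z|^p \mathbf{1}_{\{Z \in K^c\}}, \qquad (31)$$

and the right-hand side is bounded by a constant multiple of $\varepsilon$ for sufficiently large $n$, thanks to (29), (30)
and to the convergence of $\mathbb{E}|Y_n|^p \mathbf{1}_{\{Y_n \in K\}}$ to $\mathbb{E}|Z|^p \mathbf{1}_{\{Z \in K\}}$. Therefore, $\mathbb{E}|Y_n|^p \to \mathbb{E}|Z|^p$.
This, in particular, implies the uniform integrability of $(|Y_n|^p)$.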

Next, we show that for a cost function with $C(x) = O(x^p)$ at infinity, all the conditions
of Theorem 2 are satisfied. We have $\mathbb{E}C(2|Z|) < +\infty$, while the finiteness of $\int C(2|x|)\, \mu_n(dx)$
follows from (29) and the inequality

$$C(2|Y_n|) = C(2|Y_n|)\mathbf{1}_{\{|Y_n| \le x_0\}} + C(2|Y_n|)\mathbf{1}_{\{|Y_n| > x_0\}} \le C(2x_0)\mathbf{1}_{\{|Y_n| \le x_0\}} + \beta |Y_n|^p, \qquad (32)$$

with a positive constant $\beta$ such that $C(2x) \le \beta x^p$ for all $x > x_0$. The CLT provides the
weak convergence $\mu_n \Rightarrow \gamma$, while we obtain $\mathbb{E}C(2|Y_n|) \to \mathbb{E}C(2|Z|)$ from the uniform
integrability of $(C(2|Y_n|))$, which follows from (32) and the uniform integrability of $(|Y_n|^p)$. Thus,
applying Theorem 2, we obtain that $\mathcal{T}_c(\mu_n, \gamma) \to 0$.
Remark 2. Since $\mathbb{E}|Y_n|^p \to \mathbb{E}|Z|^p$, the convergence of $\mathbb{E}C(|Y_n|)$ to $\mathbb{E}C(|Z|)$ follows from
parts (a) and (c) of the result of Bickel and Freedman cited above. Then $\mathbb{E}C(2|Y_n|) \to
\mathbb{E}C(2|Z|)$ is implied by the doubling condition (6).

(ii) Once again, we check that the conditions of Theorem 2 are satisfied. First, $\mathbb{E}C(4\sqrt{2}|Z|) <
\infty$ implies that $\mathbb{E}C(2|Z|) < \infty$ and that $C(2x) = o(e^{x^2/16})$.
The function $f(x) = \exp(x^2/16)$ has the expansion

$$f(x) = \sum_{k=0}^{\infty} \frac{x^{2k}}{16^k\, k!}. \qquad (33)$$

Then Stirling's formula yields, for a generic term of series (33), starting at some index $k_0$,

\f k ' 
2^+i(k-2) k ~h 2k+ 



fk = „ , . i', I (34) 



Next, the Rosenthal inequality (7) can be written, with explicit constants, in the following form
(Petrov [17], inequality (2.35)):



E|5 n | fc < k*riE\X 1 \ k + -^-e le n : 5(T K . (35) 



2 

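Together, (33)-(35) imply, for $Y_n = S_n/(\sigma\sqrt{n})$, an estimate of the form

$$\mathbb{E}f(|Y_n|) \le M + \beta_1 Q_1 + \beta_2 Q_2 < +\infty \qquad (36)$$

for some positive constants $M$, $\beta_1$ and $\beta_2$, where $Q_1$ and $Q_2$ are the two series over $k \ge k_0$
that arise from the two terms on the right-hand side of (35) when it is substituted into (33).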

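Since $C(2x) = o(f(x))$, there exists $x_0 > 0$ such that $C(2x) \le f(x)$ for all $x > x_0$. The
inequality (36) gives

$$\begin{aligned} \mathbb{E}C(2|Y_n|) &= \mathbb{E}C(2|Y_n|)\mathbf{1}_{\{|Y_n| \le x_0\}} + \mathbb{E}C(2|Y_n|)\mathbf{1}_{\{|Y_n| > x_0\}} \\ &\le C(2x_0) + M + \beta_1 Q_1 + \beta_2 Q_2. \end{aligned} \qquad (37)$$

Therefore, each $\mathbb{E}C(2|Y_n|)$ is finite and, moreover, the sequence $(\mathbb{E}C(2|Y_n|))$ is uniformly bounded.
Next, we check that $(\mathbb{E}C(2|Y_n|)\mathbf{1}_A)$, $A \in \mathcal{F}$, is uniformly continuous.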

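Fix $\varepsilon > 0$, and choose $x_1 > 0$ such that $C(2x) \le \varepsilon_1 f(x)$ for all $x > x_1$, with
$\varepsilon_1 = \varepsilon/(2(M + \beta_1 Q_1 + \beta_2 Q_2))$. If $\mathbb{P}(A) \le \varepsilon/(2C(2x_1))$, then, in complete analogy with (37), we have

$$\sup_n \mathbb{E}C(2|Y_n|)\mathbf{1}_A \le C(2x_1)\mathbb{P}(A) + \varepsilon_1 \sup_n \mathbb{E}f(|Y_n|)\mathbf{1}_A \le \varepsilon. \qquad (38)$$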

The inequalities (37) and (38) yield the uniform integrability of $(C(2|Y_n|))$, while the classical
CLT provides $\mu_n \Rightarrow \gamma$. Hence, $\mathbb{E}C(2|Y_n|) \to \mathbb{E}C(2|Z|)$ and all the conditions of Theorem 2
are satisfied. Then $\mathcal{T}_c(\mu_n, \gamma) \to 0$. This concludes the proof of Theorem 3. □
Proof of Theorems 4 and 5. These proofs use the same arguments as the
proof of Theorem 3.

For sequences of dependent random variables we assume additional conditions of asymptotic
independence, which yield the moment inequalities (9) and (11) for the sums:
condition (8) for strongly mixing sequences and condition (10) for associated
sequences. We also use the bounds on $K(p)$ in (9) and (11) derived by Doukhan and Louhichi.